An algorithm for synchronising concurrent operations on extendible hash files is presented. The algorithm is deadlock free and allows the search operations to proceed concurrently with insertion operations without having to acquire locks on the directory entries or the data pages. It also allows concurrent insertion/deletion operations to proceed without having to acquire locks on the directory entries. The algorithm is also unique in that it combines the notion of verification, fundamental to the optimistic concurrency control algorithm, and the special and known semantics of the operations in extendible hash files. A proof of correctness for the proposed algorithm is also presented.


1.  Introduction 

The concurrency control algorithm in a conventional database management system enforces serialisability of transactions [Papadimitriou79]. Each transaction is normally modeled as a sequence of read and write steps, and the concurrency control algorithm enforces serialisability without assuming much knowledge of the semantics of the read and write steps of the transactions. While this level of generality enables the concurrency control algorithm to be applicable to any transaction system, it does not take advantage of the structures inherent in the applications to optimise for a higher level of concurrency and lower synchronisation overhead.

In recent years specialised concurrency control algorithms that take advantage of the knowledge of the structure and/or the semantics of transactions have appeared [e.g., SK80, KS83, KP78, HM85, HC85, O'Neil86]. In particular, much attention has been paid to the optimisation of algorithms that synchronise concurrent operations on B-trees [e.g., BS77, LY81, MR85].

In this paper we present an algorithm that synchronises concurrent operations on a file structured using extendible hashing [FNPS79]. Extendible hashing is a form of dynamic hashing which adaptively updates a directory of pointers to data buckets, or data pages. Since the directory entries are subject to update at any moment, a search operation would normally be required to obtain a lock on the directory entry it reads to prevent the directory entry from being inadvertently changed. However, by exploiting the known semantics of the accesses to the directory entries, it is conceivable that one can devise concurrency control algorithms that minimise such overhead.

We present a concurrency control algorithm that allows the search operation in an extendible hash file to proceed without having to set locks on the directory entries. We also allow concurrent insertions to be synchronised with a mechanism which is simpler and potentially able to offer a higher degree of concurrency.

The algorithm is also unique in that it utilises the general mechanism behind the optimistic concurrency control algorithms [KR81]. By making use of verification at the right moment, operations are guaranteed a consistent view of the data structures required to ensure their correctness while minimising the locking overhead.

The structure of the paper is as follows. In the next section, the general mechanism of the extendible hashing scheme is reviewed. In Section three, we present our concurrent search and insertion algorithms, followed by a proof of correctness in Section four. Section five concludes the paper and presents a discussion of future extensions.

2. Review of Extendible Hashing

Extendible hashing [FNPS79] is a file structuring and searching technique in which the user is guaranteed no more than two page accesses to locate the data associated with a given key. Unlike conventional hashing, extendible hashing has a dynamic structure that grows and shrinks gracefully as the database grows and shrinks.

The file consists of a directory ($D$) and data pages. The directory is characterised by a global depth $d$ and contains $2^d$ entries, each of which points to a data page. The hash function, $h$, transforms the keys of the key set into a 'pseudo key' of bit form; the first $d$ bits of the pseudo key determine the directory entry corresponding to a key. Each data page is characterised by a local depth $l$ ($l \le d$) and a bit pattern $b$ of length $l$. A data page with an $l$-bit bit pattern $b$ contains all keys the first $l$ bits of whose pseudo keys conform to the bit pattern $b$. When a data page overflows, its local depth is incremented by 1 and the page is split in two: one page is now characterised by a bit pattern which is the old bit pattern concatenated with an additional bit of '0', and the other, with the bit of '1'.
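The addressing and splitting rules above can be illustrated with a small Python sketch. It assumes, purely for illustration, that pseudo keys are 16-bit integers (the constant `PSEUDO_KEY_BITS` and all helper names are hypothetical) and represents a bit pattern as an integer together with its length.

```python
from __future__ import annotations

PSEUDO_KEY_BITS = 16  # hypothetical width of a pseudo key


def prefix(pseudo_key: int, nbits: int) -> int:
    """Return the first (most significant) `nbits` bits of a pseudo key."""
    return pseudo_key >> (PSEUDO_KEY_BITS - nbits)


def directory_index(pseudo_key: int, global_depth: int) -> int:
    """The first d bits of the pseudo key select one of the 2**d directory entries."""
    return prefix(pseudo_key, global_depth)


def split_page(keys: list[int], pattern: int, local_depth: int):
    """Split a page whose bit pattern `pattern` has length `local_depth`.

    Both halves get local depth l + 1; their patterns are the old pattern
    followed by '0' and by '1', respectively.
    """
    depth = local_depth + 1
    zero_pattern = pattern << 1
    one_pattern = (pattern << 1) | 1
    zeros = [k for k in keys if prefix(k, depth) == zero_pattern]
    ones = [k for k in keys if prefix(k, depth) == one_pattern]
    return (zero_pattern, depth, zeros), (one_pattern, depth, ones)
```

For instance, splitting the page with pattern '1' (local depth 1) separates keys beginning with '10' from keys beginning with '11', as in the example that follows.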

Example. Consider the state of an extendible hash file as shown in Figure 2.1. Currently there are very few records with pseudo keys that begin with '1'. All such records are collected into a single data page whose local depth is 1 and whose 1-bit bit pattern is '1'. When the page becomes full, as shown in Figure 2.2, it splits into two data pages, each with local depth of 2; one data page now has a bit pattern of '10' and the other '11'. All keys whose pseudo keys begin with '10' appear in the first of these data pages, and all keys whose pseudo keys begin with '11' appear in the other.
When a data page whose local depth is equal to the global depth of the directory overflows, the directory size is doubled, i.e., the global depth is incremented by 1, and the overflowing data page is again allowed to split. For example, if we start with the situation as shown in Figure 2.2, and if the data page pointed to by the '010' pointer is already full, then the directory is doubled and the page splits, as shown in Figure 2.3. (Figures 2.1 to 2.3 are taken from Figures 8 to 10 in [FNPS79].)
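A minimal sketch of directory doubling, assuming the directory is a Python list of page identifiers indexed by the first $d$ bits of the pseudo key. The function names and page labels are hypothetical and do not reproduce the contents of the figures.

```python
from __future__ import annotations


def double_directory(directory: list[str]) -> list[str]:
    """Double a directory of 2**d entries into one of 2**(d+1) entries.

    New entry j (d+1 bits) inherits the pointer of old entry j >> 1, i.e. the
    entry sharing its first d bits, so the doubling itself moves no page.
    """
    return [directory[j >> 1] for j in range(2 * len(directory))]


def repoint_after_split(directory: list[str], global_depth: int,
                        one_pattern: int, new_depth: int, new_page: str) -> list[str]:
    """Point every entry whose first `new_depth` bits equal `one_pattern`
    (the pattern ending in '1') at the newly allocated page."""
    shift = global_depth - new_depth
    return [new_page if (j >> shift) == one_pattern else ptr
            for j, ptr in enumerate(directory)]
```

For example, doubling `["A", "B", "C", "D"]` yields `["A", "A", "B", "B", "C", "C", "D", "D"]`; splitting page `B` into patterns '010' and '011' then re-points only entry '011' to the new page.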


The extendible hashing scheme uses a contiguously allocated directory whose size changes by factors of two. It enables direct access to the right data page (or bucket). No overflow area is used. In [FNPS79], it is shown that, in the case where the bucket (page) size is *00 and the size of the key set is 40,000, the storage utilisation, on the average, is about 69%.
3. Concurrent Operations in Extendible Hashing

In this section we describe the algorithm of our concurrent operations in extendible hash files. Throughout we will ignore the issue of underflow and compaction. In other words, the number of pages of the file only grows and never shrinks. The compaction issue was also ignored in [LY81] and is generally justified by the observation that databases tend to grow and the utility of the storage recovered from on-line real-time compaction may not be worth the trouble. Compaction can be handled by taking the database offline for a reorganisation.

3.1.  Search  Algorithm 

The search operation on an extendible hash file consists of (1) applying the hash function to obtain a pseudo key, (2) examining the first $d$ bits of the pseudo key to determine the directory entry to be read, (3) reading the directory entry to find a pointer to the data page to be searched, and (4) searching in the data page to find the key desired.


What the search operation is vulnerable to is the concurrent insertion operation that splits a data page and relocates a range of the keys that includes the key desired by the search operation. This type of interference can be eliminated by requiring the search and the insertion operations to obtain a lock on the directory entry and hold it until the operation ends. In our search algorithm, however, this type of interference is avoided, without holding any lock on the directory, by re-reading the directory entry whenever a search operation could not find the key in the data page it has just read. This form of re-reading, or verification, continues until either the key is found, or the value of the directory entry does not change between two consecutive readings. The algorithm is formally defined shortly.

Intuitively, the search algorithm attempts to verify the directory entry it has previously read before it would conclude a search failure. If the content of the directory entry has changed in the meantime, the search operation automatically retries with the new pointer obtained. A formal proof of correctness of the algorithm is presented in Section 4.


Definition of the Search Algorithm

Algorithm Search(given key k);
begin
  initialisation:
    xold := 0;
  hashing:
    calculate k' = h(k) = b1 b2 b3 ...;
  getpointer:
    read d, base;        /* the global depth and base address of the directory D */
    t := b1 b2 ... bd;   /* take the initial d bits of k' */
    x := get(D(t));      /* D(t) is the t-th entry in D */
  probe:
    do while x ≠ xold;
      A := get(x);       /* read a data page */
      if key k in A then 'success', return(x);   /* ends search */
      xold := x;
      x := get(D(t));    /* re-read directory */
    end;
    return('search fails');
end;
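The search procedure can be transcribed into Python as a sketch. It is single-threaded: the concurrent insertion is simulated by a hypothetical `before_page_read` hook that runs between the directory read and the page read, which is exactly the window in which a split can relocate the key. `HashFile` and the directory index `t` (assumed precomputed from the first $d$ bits of $h(k)$) are illustrative and not part of the original algorithm.

```python
from __future__ import annotations

from typing import Callable, Optional


class HashFile:
    """Toy in-memory extendible hash file (hypothetical): `directory[t]` holds a
    page id and `pages[page_id]` holds the set of keys stored in that page."""

    def __init__(self, directory: list[str], pages: dict[str, set[int]]):
        self.directory = directory
        self.pages = pages

    def get_entry(self, t: int) -> str:
        return self.directory[t]            # models the atomic get(D(t))

    def get_page(self, page_id: str) -> set[int]:
        return set(self.pages[page_id])     # models the atomic get(x): a private copy


def search(f: HashFile, k: int, t: int,
           before_page_read: Optional[Callable[[], None]] = None) -> Optional[str]:
    """Lock-free search: after a miss, re-read D(t) and retry until the entry
    is unchanged between two consecutive reads. Returns the page id holding k,
    or None for 'search fails'."""
    x_old = None
    x = f.get_entry(t)
    while x != x_old:
        if before_page_read is not None:
            before_page_read()              # test hook: a concurrent operation runs here
        page = f.get_page(x)
        if k in page:
            return x                        # success
        x_old = x
        x = f.get_entry(t)                  # verification: re-read the directory entry
    return None                             # entry unchanged between two reads


def demo_split_during_search() -> Optional[str]:
    """Entries 0 and 1 share page P; key 3 belongs to entry 1. An insertion
    splits P after the search has read D(1) but before it reads P."""
    f = HashFile(["P", "P"], {"P": {1, 3}})
    fired = False

    def split_once() -> None:
        nonlocal fired
        if not fired:
            fired = True
            f.pages["Q"] = {3}              # new page written first
            f.directory[1] = "Q"            # then the directory entry
            f.pages["P"] = {1}              # splitting page written last
    return search(f, 3, 1, split_once)
```

In the demo, the first probe of page `P` misses because the split has already moved key 3 to `Q`; the re-read of $D(1)$ returns `Q`, so the search retries instead of reporting a failure.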


3.2. Insertion Algorithm

The insertion operation in an extendible hash file consists of (1) applying the hashing function to the key to obtain the pseudo key, (2) examining the first $d$ bits of the pseudo key to determine the directory entry to be read, (3) reading the directory entry to obtain a pointer to a data page, (4) reading the data page to search for the existence of the same key, and (5) inserting the key in the data page, if the key does not already exist. When inserting the new key, if the data page is full, then a split is performed, resulting in a new data page being created and at least one directory entry being updated. For now we will ignore the issue of directory expansion (i.e., doubling in size). We will revisit this issue briefly in the final section of this paper.

Two insertion operations may interfere even when they are inserting different keys. Undesirable interference may be eliminated by requiring the insertion operation to hold locks on both the directory entries and the data page that it updates till the end of the operation. In our algorithm, however, the need to hold locks on the directory entries is avoided by requiring the insertion operation to perform verification of the content of the directory entry it has previously read after locking the data page and before performing updates on the data page. If verification fails, the operation unlocks the page, locks a different one, and performs another verification. An insertion operation therefore never waits for a lock while it holds another lock; hence deadlock is eliminated.
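The lock-and-verify step can be sketched with Python's `threading.Lock`, one lock per page. This is an illustration under assumptions: `Store` and `insert_no_split` are hypothetical, only the non-splitting case is shown, and directory entries are read without any lock, as in the algorithm.

```python
from __future__ import annotations

import threading


class Store:
    """Hypothetical in-memory file: directory entries are plain list slots read
    without locks; each data page has its own exclusive lock."""

    def __init__(self, directory: list[str], pages: dict[str, set[int]]):
        self.directory = directory
        self.pages = pages
        self.locks = {pid: threading.Lock() for pid in pages}


def lock_and_verify(s: Store, t: int) -> str:
    """Lock the page D(t) points to, then re-read D(t) to verify it.

    If the entry changed, the stale page is unlocked before the new one is
    locked, so the caller never waits for a lock while holding another.
    Returns a page id that is locked and still pointed to by D(t).
    """
    x_old = s.directory[t]
    s.locks[x_old].acquire()
    x = s.directory[t]                  # verification read, after locking
    while x != x_old:
        s.locks[x_old].release()        # page no longer covers D(t): give it up
        x_old = x
        s.locks[x_old].acquire()
        x = s.directory[t]
    return x


def insert_no_split(s: Store, k: int, t: int, capacity: int) -> bool:
    """Case 1 of Insert (no overflow). Returns False on a duplicate key."""
    p = lock_and_verify(s, t)
    try:
        if k in s.pages[p]:
            return False                # 'error duplication'
        if len(s.pages[p]) >= capacity:
            raise NotImplementedError("page split is not sketched here")
        s.pages[p] = s.pages[p] | {k}   # put(A, x -> p) as one assignment
        return True
    finally:
        s.locks[p].release()            # unlock(x), also on the duplicate path
```

The design choice mirrored here is that verification happens after the page lock is granted: only then can a matching re-read guarantee that no split has redirected the entry away from the locked page.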

In handling splitting, our algorithm requires that the newly allocated page be locked until the affected directory entry (entries) is (are) updated. Inherent in the dynamic hashing algorithm, however, is the complication that when a key $k$ is to be inserted into a page which is already full, one split may not be enough. When splitting occurs, the local depth of the splitting page is incremented by one and a new page is allocated in the database. The original key range in the splitting page is divided in half, with the higher half distributed into the new page and the lower half retained in the splitting page. One of these two pages, say $p$, now contains the key range that includes $k$. Note that in extreme cases $p$ may be full again before $k$ is inserted. This occurs when all the existing records in the splitting page are hashed into the halved key range that contains $k$. When this occurs, $p$ needs to be split again before $k$ can be inserted. This process must continue until $k$ finally falls in a page which is not full. However, the number of splits required, and therefore the number of new pages needed to be allocated to allow $k$ to be inserted, can be determined from the contents of the splitting page when it is first examined. We will denote this number by $e$. Since each split increments the local depth by one and, with directory doubling ignored, the local depth cannot exceed the global depth, $e$ ranges from $0$ to $d - l$, where $d$ is the global depth and $l$ is the local depth of the splitting page before splitting.
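Because the split sequence depends only on the keys already in the page and on $k$, the count can be computed up front. A sketch, again assuming hypothetical 16-bit integer pseudo keys; the function returns the total number of splits (0 if the page is not full) and raises an error where directory doubling, ignored in this section, would be needed.

```python
from __future__ import annotations

PSEUDO_KEY_BITS = 16  # hypothetical width of a pseudo key


def prefix(pseudo_key: int, nbits: int) -> int:
    """First (most significant) `nbits` bits of a pseudo key."""
    return pseudo_key >> (PSEUDO_KEY_BITS - nbits)


def splits_needed(page_keys: list[int], k: int, local_depth: int,
                  global_depth: int, capacity: int) -> int:
    """Total number of successive splits before pseudo key k fits.

    After each split only the half whose bit pattern matches k matters; we
    keep splitting while that half is still full. Raises ValueError when the
    local depth would exceed the global depth (directory doubling needed).
    """
    depth = local_depth
    same_range = list(page_keys)
    splits = 0
    while len(same_range) >= capacity:
        depth += 1
        if depth > global_depth:
            raise ValueError("directory doubling would be required")
        splits += 1
        same_range = [x for x in same_range if prefix(x, depth) == prefix(k, depth)]
    return splits
```

With capacity 3, a page at local depth 1 holding `0xE000`, `0xE100` and `0xF000` needs three splits before `0xE800` fits when the global depth is 4, the largest number possible for those depths.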

The way our concurrent insertion algorithm deals with the above complication is to (1) have the splitting page as well as all the newly allocated pages in the database locked, (2) rearrange the contents of these pages in private work space, allowing $k$ to be inserted, (3) write the newly allocated pages back to the database, (4) update all the affected directory entries, (5) unlock all new pages, (6) write the splitting page back to the database, and finally (7) unlock the splitting page. One may choose to combine steps (5) and (7) together as the last step, but that is not strictly necessary. Note that during the entire operation no directory entries are locked and all search operations proceed without being blocked. In particular, in step (4) above, when multiple directory entries are updated, they are updated one by one without having to be updated all in one atomic action. It is assumed, however, that updating any single directory entry is atomic, as is writing any single data page to the database.

We provide the definition of our insertion algorithm below, and the formal proof of correctness is presented in Section 4.

3.3. Deletion Algorithm

A deletion operation in an extendible hash file consists roughly of the same set of steps as the insertion operation, except that it need not deal with the issue of overflow and page splitting. For our purpose, as mentioned at the beginning of this section, we will ignore the issue of underflow and compaction. Therefore, syntactically, a deletion operation is just like an insertion operation that does not encounter overflow. For brevity, we do not include a formal definition of its algorithm.


Algorithm Insert(given key k);
begin
  hashing:
    calculate k' = h(k) = b1 b2 b3 ...;
  getpointer:
    read d, base;        /* the global depth and base address of the directory D */
    t := b1 b2 ... bd;   /* take the initial d bits of k' */
    x := get(D(t));      /* D(t) is the t-th entry in D */
  lock_and_verify:
    xold := x;
    lock(xold);
    x := get(D(t));      /* re-read directory entry after locking */
    do while xold ≠ x;   /* verification loop */
      unlock(xold);      /* release the page D(t) no longer points to */
      xold := x;
      lock(xold);
      x := get(D(t));    /* re-read */
    end;
  probe:
    A := get(x → p);     /* read data page p pointed to by x */
    if key k in A then 'error duplication', unlock(x), return;
    case 1. |A| < c      /* no need to split, where c is the capacity of a page */
      A := page_insert(A, k);
    case 2. |A| = c      /* split required; assume no directory doubling */
      n := number of new pages required;
      y1, ..., yn := allocate n new pages in database;
      lock(y1, ..., yn); /* keep new pages locked */
      A, B1, ..., Bn := rearrange old A and B's, adjust local depths, insert k;
      for i := 1 to n do;
        put(Bi, yi);     /* write B's into database */
      end;
      directory_modify(D, y1, ..., yn);
      unlock(y1, ..., yn);
    put(A, x → p);
    unlock(x);
end;

The function of directory_modify is

Procedure directory_modify(D, y1, ..., yn);
begin
  for all directory entries j affected by split do;
    i := subscript of newly allocated page containing key range of entry j;
    put(yi, D(j));
  end;
end;
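A Python rendering of `directory_modify`, as a sketch under assumptions: the directory is a list indexed by the first $d$ bits, and each newly allocated page is described by a hypothetical `(bit pattern, local depth)` pair rather than by its key range. Each list assignment stands for one atomic `put`, so entries are updated one at a time, as the algorithm permits.

```python
from __future__ import annotations


def directory_modify(directory: list[str], global_depth: int,
                     new_pages: dict[str, tuple[int, int]]) -> list[int]:
    """Re-point directory entries after a split, one entry at a time.

    `new_pages` maps each newly allocated page id to its (bit pattern, local
    depth). Entry j is re-pointed to the new page whose pattern equals the
    first `depth` bits of j; entries still covered by the splitting page are
    left untouched. Returns the indices that were updated.
    """
    updated = []
    for j in range(len(directory)):
        for page_id, (pattern, depth) in new_pages.items():
            if j >> (global_depth - depth) == pattern:
                directory[j] = page_id    # put(y_i, D(j)): one atomic update
                updated.append(j)
                break
    return updated
```

For example, if page `C` (pattern '1') in a directory of global depth 3 splits twice, keeping '10' and producing new pages with patterns '110' and '111', only entries 6 and 7 change.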




3.4.  Discussion  of  Performance 

In this subsection we briefly discuss how our proposed algorithm compares with "standard techniques". To the best of our knowledge, there has been little discussion of concurrent operations in extendible hashing in the literature. Therefore we will assume the "standard technique" in this case to be two-phase locking (2PL). Using 2PL, a search operation must (1) obtain a shared lock on the directory entry, (2) obtain a shared lock on the data page pointed to by the directory entry, and (3) perform the search and then release both locks. An insertion/deletion operation must (1) obtain an exclusive lock on the directory entry, (2) obtain an exclusive lock on the data page pointed to by the directory entry, and (3) perform updates and release both locks. If the insertion encounters the need to split the data page, it must additionally acquire exclusive locks on all directory entries affected by the split before updating these entries and before releasing any lock that it has acquired.

We first show that the standard technique is prone to deadlocks. Consider two adjacent directory entries $d_1$ and $d_2$ pointing to the same data page $p$, where $p$ currently has a local depth which is 1 less than the global depth. Two insertion operations $I_1$ and $I_2$ are run, one with a pseudo key mapped to $d_1$, and the other to $d_2$. Consider the following interleaved execution sequence using the standard technique:

$I_1$ locks $d_1$;
$I_2$ locks $d_2$;
$I_1$ locks $p$;
$I_1$ reads $p$ and encounters overflow;
$I_1$ attempts to lock $d_2$;
$I_2$ attempts to lock $p$;

The two operations are now deadlocked.
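The deadlock can be checked mechanically by replaying the schedule and building a wait-for graph. The sketch treats every lock as exclusive (as the insertions in the example request) and releases no locks during the replay; the schedule encoding is hypothetical.

```python
from __future__ import annotations


def deadlocked(schedule: list[tuple[str, str, str]]) -> bool:
    """Replay (operation, action, resource) steps under exclusive locking and
    report whether the wait-for graph contains a cycle.

    A lock request on a resource held by another operation adds the edge
    requester -> holder; non-lock actions are ignored.
    """
    holder: dict[str, str] = {}
    waits_for: dict[str, str] = {}
    for op, action, res in schedule:
        if action != "lock":
            continue
        if res in holder and holder[res] != op:
            waits_for[op] = holder[res]
        else:
            holder[res] = op
    # a blocked operation waits for exactly one other, so follow each chain
    for start in waits_for:
        seen, cur = set(), start
        while cur in waits_for:
            if cur in seen:
                return True
            seen.add(cur)
            cur = waits_for[cur]
    return False
```

Replaying the six steps above yields the edges $I_1 \to I_2$ (waiting for $d_2$) and $I_2 \to I_1$ (waiting for $p$), which form a cycle; without the last step there is no cycle.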

Also, using the standard technique, while a search operation is never blocked by another search operation, it may be blocked by an insertion operation, and vice versa. In our algorithm, a search operation is never blocked by an insertion operation. Furthermore, in our algorithm, insertion operations do not have to acquire a lock on the directory entry before reading it, resulting in savings in locking overhead. The exact nature of the performance of the algorithm as compared to the standard technique would require additional analysis.

While the proposed algorithm offers freedom from deadlocks, a potentially higher level of concurrency and savings in locking overhead, it is conceptually simple and should be just as easy, if not easier, to implement. The only additional cost in the proposed algorithm is the cost of verification. The search operation is potentially required to perform verification of the content of the directory entry previously read. This verification is needed only when the key desired is not found. The insertion algorithm is always required to perform verification. However, it can be argued that, when a verification is performed on a directory entry, the likelihood that the latter is memory-resident (i.e., in the buffer pool) is very high. This is true even if one does not in general keep the entire directory in memory. Therefore the cost of verification due to re-reading the directory entries is but a few memory accesses, and can be largely ignored.


4.  Proof  of  Correctness 

To show that the above algorithm is correct, we use the following steps:

(1) Show that all operations are deadlock-free and will terminate.
(2) Show that the search operation is correct.
(3) Show that the insertion/deletion operation is correct.


Assumptions:

(1) The database is finite in size. In other words, there exists a bound on the global depth.

(2) Each search/insertion/deletion operation consists of a sequence of read and write steps. Each read/write step involves a data granule which is either a directory entry or a data page. We assume that each read and write step on each data granule is guaranteed to be atomic by the underlying system, on top of which the current algorithms are implemented. In other words, we assume that the get and put steps in the definition of the algorithm are atomic steps. Note that this assumption can be supported by a synchronisation mechanism at a lower level if necessary.

In order to provide a proof of correctness, the criterion of correctness must first be articulated. We first give the following definitions before we discuss the criterion of correctness.

Definition. A schedule is a sequence of steps, each of which is in the form of $A_x(OP)$. The action $A$ can be read ($R$) or write ($W$). The data granule is $x$, which can either be a directory entry, denoted as $d$, or a data page, denoted as $p$. $OP$ is an operation, which may either be a search operation, denoted as $S$, which consists of two steps $R_d$ and $R_p$, or an insertion/deletion operation, denoted as $I$, which consists of at least three steps, $R_d$, $R_p$ and $W_p$. (Additional $W_d$ and $W_p$ steps may also appear in an insertion operation.) An operation can also be denoted, together with the key $k$ of the record to be operated on, as $S(k)$ or $I(k)$.

Example. An example of a schedule is

in which three operations are involved.

Definition. Let $A$ and $A'$ be two steps in a schedule. We say that $A < A'$ if $A$ occurs before $A'$ in the schedule.

Example. In the above example schedule, $R_d(I) < W_p(I)$.

Criterion of Correctness. The unit of atomicity used for the purpose of defining correctness is the operation. In other words, the algorithm is correct if any interleaved schedule $C$ that the algorithm allows is equivalent (i.e., having the same net effect) to some serialised execution $SE$ of the same set of operations, subject to an additional restriction to be described in the next paragraph. The notion of "having the same net effect" is defined as follows: if a search operation fails (succeeds with record $r$) in $C$, it also fails (succeeds with record $r$) in $SE$, and if an insertion/deletion operation succeeds (fails) in $C$, it also succeeds (fails) in $SE$.

We first illustrate the additional restriction, followed by the formal definition of the criterion of correctness. It is spurious to consider an interleaved schedule $C$ correct if it results in a failure of a search operation (i.e., the search operation does not find the key it is looking for) while the search operation starts in $C$ after an insertion operation that inserts that key has finished its last step. For example, consider an interleaved schedule $C = \langle \ldots, W_p(I), \ldots, R_d(S), \ldots \rangle$ and assume that $I$ inserts key $k$ in page $p$ and $W_p(I)$ is its last step, $S$ searches for key $k$ and fails, and no deletion operation is involved in this schedule. While one may find the net result of schedule $C$ equivalent to that of a serialised execution where $S$ is run before $I$, it is meaningless to consider $C$ correct. Therefore we define a more meaningful and more intuitive criterion of correctness, while retaining the basic notion of atomicity at the operation level, as follows:

A schedule $C$ of an interleaved execution of a set of search and insertion/deletion operations is correct if the net effect of $C$ is equivalent to some serialised execution $SE$ of the same set of operations s.t. if the last step of $OP_i$ is before the first step of $OP_j$ in $C$, then $OP_i$ is before $OP_j$ in $SE$.
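The criterion can be restated symbolically. Here $\mathrm{effect}$, $\mathrm{first}_C$ and $\mathrm{last}_C$ are shorthand introduced only for this restatement: the net effect of an execution, and the positions in $C$ of an operation's first and last steps.

```latex
C \text{ is correct} \iff \exists\, SE \text{ (serial, same operations)}:\;
\mathrm{effect}(C) = \mathrm{effect}(SE)
\;\wedge\;
\forall i \neq j:\ \mathrm{last}_C(OP_i) < \mathrm{first}_C(OP_j) \Rightarrow OP_i <_{SE} OP_j
```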


4.1.  Proof  of  Termination 

Lemma 1. All operations terminate.

Proof. Since no operation would hold any lock while waiting for another, no circular wait-for is possible; therefore no deadlock is possible. The termination proof thus amounts to proving that the potential loop in each operation will terminate. All operations potentially involve a loop of re-reading a directory entry. Given an operation $O$ that involves such a loop, the loop in $O$ terminates when the content of the last directory entry read is the same as that of the previous directory entry read. The content of any directory entry changes only when a split occurs in the data page that the directory entry points to. The number of times that any data page can split is bounded by $\log_2 N$, where $N$ is the maximum number of pages allowed in this system, i.e., it is bounded by the maximum global depth of the system; hence the number of times the value of a directory entry will change is bounded by $\log_2 N$. Therefore the loop of re-reading the directory entry in $O$ will terminate.
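The counting behind the lemma can be made explicit: every pass through the loop after the first requires the value of $D(t)$ to differ from the previous read, so each extra pass consumes at least one distinct change of that entry.

```latex
\text{passes of the loop in } O \;\le\; 1 + \#\{\text{changes of } D(t) \text{ during } O\} \;\le\; 1 + \log_2 N
```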

4.2. The Search Operation is Correct
Lemma 2. The search operation is correct.

Proof. To prove that search operations are correct, we investigate what could possibly be the cases for a search to be incorrect. Since all search operations terminate, they either succeed or fail. We consider each of these two cases separately.

(i) If a search operation $S$ succeeds, i.e., if it finds the key it is looking for, then it must be correct. This can be shown as follows. Suppose the record it finds is $r$. Then there must exist an insertion operation $I$ that inserts $r$. We can construct an equivalent serialised execution in which $I$ is before $S$. If there also exists a deletion operation $I'$ which deletes $r$, then in the equivalent serialised execution we must let $I'$ be after $S$. This equivalent serialised execution is legal (according to the definition of correctness) as long as the last step of $I'$ did not come before the first step of $S$ in our interleaved schedule. Suppose the last step of $I'$ did come before the first step of $S$. Then the only way for $r$ to still linger in the database when $S$ starts is for it to be in some data page $p$ from out of which $r$ was relocated (i.e., via page split) to a different page $p'$, from which $I'$ deleted $r$, and $p$ is still in a transient state containing $r$. However, if $I'$ is finished by the time $S$ starts, the directory entry corresponding to $r$ would already be pointing to $p'$. $S$ therefore could not possibly get access to $p$. Therefore the last step of $I'$ could not come before the first step of $S$ in our interleaved schedule, and the equivalent serialised execution is legal. Therefore the search operation is correct.

(ii) If a search operation $S$ fails, it could fail incorrectly only when concurrent relocation exists. In other words, we want to show that if a search operation $S(k)$ fails, and the last data page read by $S(k)$ is $p$, then there exists no insertion operation $I$ such that $I$ relocates the key range containing $k$ from $p$ to $p' \neq p$ before $S(k)$ reads $p$.

Suppose that there exists such an insertion operation $I$. Let $d$ be the directory entry corresponding to the key $k$. Then $I$, before finishing, would first write the directory entry $d$ and then write $p$. We denote these steps as $W_d(I)$ and $W_p(I)$. We also denote the final steps of $S(k)$ in reading directory entry $d$, reading page $p$, then re-reading (i.e., verifying) directory entry $d$ as $R_d(S)$, $R_p(S)$ and $V_d(S)$. By definition of the failed search operation, the value read in $R_d(S)$ would be equal to that of $V_d(S)$. There are four cases of possible interleavings.

(1) $W_d(I) < R_d(S)$ and $W_p(I) < R_p(S)$. In this case, since $I$ relocates $k$ from $p$ to $p'$, the directory entry read by $S(k)$ should not contain a pointer to $p$; therefore $S(k)$ would not have read $p$, a contradiction.

(2) $W_d(I) < R_d(S)$ and $R_p(S) < W_p(I)$. In this case, by a similar argument as above, $S$ should not have read $p$, also a contradiction.

(3) $R_d(S) < W_d(I)$ and $W_p(I) < R_p(S)$. In this case, since $W_d(I) < W_p(I) < R_p(S) < V_d(S)$, the $V_d$ step of $S(k)$ would have read the new pointer (i.e., to $p'$), which is not equal to the old pointer (i.e., to $p$) read in the $R_d$ step, a contradiction.

(4) $R_d(S) < W_d(I)$ and $R_p(S) < W_p(I)$. In this case, $S(k)$ would read the old content of page $p$ before $I$ relocates $k$ out of $p$, contradicting the definition of $I$.

Therefore we conclude that there exists no insertion operation $I$ that could have relocated $k$ out of $p$ before $S(k)$ reads $p$ as its last data page to read before termination. Therefore the search operation is correct.

Combining (i) and (ii) above, we conclude that the search operation is correct.
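The case analysis in (ii) can be cross-checked by brute force over a simplified model: $S$ performs $R_d$, $R_p$, $V_d$ in order, $I$ performs $W_d$ then $W_p$, and the new page $p'$ is assumed to be written before $W_d$, so that a read through the new pointer always finds $k$. The sketch enumerates all ten interleavings and checks that none lets $S$ read the old pointer twice while seeing $p$ without $k$. It is a sanity check of this model, not a replacement for the proof.

```python
from itertools import combinations

S_STEPS = ["R_d", "R_p", "V_d"]   # S: read D(d), read page p, re-read D(d)
I_STEPS = ["W_d", "W_p"]          # I: point D(d) at p', then rewrite p without k


def interleavings():
    """All orders of S's and I's steps that preserve each operation's own order."""
    n = len(S_STEPS) + len(I_STEPS)
    for i_slots in combinations(range(n), len(I_STEPS)):
        s_iter, i_iter = iter(S_STEPS), iter(I_STEPS)
        yield [next(i_iter) if pos in i_slots else next(s_iter) for pos in range(n)]


def incorrect_failure(order) -> bool:
    """True if S reads pointer p in both R_d and V_d yet sees p without k."""
    pos = {step: i for i, step in enumerate(order)}
    ptr_at_read = "p" if pos["R_d"] < pos["W_d"] else "p'"
    ptr_at_verify = "p" if pos["V_d"] < pos["W_d"] else "p'"
    k_gone_from_p = pos["W_p"] < pos["R_p"]
    return ptr_at_read == ptr_at_verify == "p" and k_gone_from_p
```

An incorrect failure would need $V_d(S) < W_d(I)$ together with $W_p(I) < R_p(S)$, which contradicts $W_d(I) < W_p(I)$ and $R_p(S) < V_d(S)$; the enumeration simply confirms this.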

4.3. Insertion/Deletion is Correct

Since search operations do not update the database, they would not affect the correctness of an insertion operation. Therefore, to prove that insertion operations are correct, we need only take into account interferences among insertion operations themselves, and between insertion and deletion.

We introduce some notation to refer to specific steps of an insertion/deletion operation. We are interested in the trailing end of the steps in these operations, i.e., those in the final round of the verification loop and those at the very end. The sequence of read/write steps of each round of the verification loop of an insertion/deletion consists of $\langle R_d, L_p, V_d \rangle$, where $R_d$ reads a directory entry, $L_p$ stands for the exclusive lock of p, the page that entry points to, and $V_d$ stands for the step of verifying the content of the directory entry read in $R_d$. We denote $R_d$ and $V_d$ in the last round of verification as $R'_d$ and $V'_d$. Note that, by definition, the content of the directory entry read in $R'_d$ and $V'_d$ must be identical. After the last round of verification, the page pointed to by the value read in $R'_d$ is read. We denote this step as $R'_p$. The final sequence of steps of an insertion/deletion operation that does not involve a split is $\langle W_p, U_p \rangle$, where $W_p$ and $U_p$ stand for the write and unlock of the page p which was locked between $R'_d$ and $V'_d$ during the last round of verification. The sequence for one involving a split is $\langle W_d, U, W_p, U_p \rangle$, where $U$ unlocks all new pages, and $W_d$ is the last directory entry update. We will denote these last steps of directory update and page write as $W'_d$ and $W'_p$. Note that the directory entry written in $W'_d$ may not be the same entry read in $R'_d$ or $V'_d$.
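The step sequences above can be traced in a minimal Python sketch of an extendible hash file. It is our own illustration under stated assumptions, not the authors' implementation: the capacity, the fixed global depth (directory doubling is not modelled), the suffix-bit indexing, the hash `pseudo_key`, and the names `Page`, `ExtendibleHash` and `_lock_verified` are all hypothetical. A page write is modelled by assigning a fresh `frozenset`, so a lock-free reader sees either the old or the new page content; the sketch relies on CPython making a single attribute or list-element assignment atomic.

```python
import threading

CAPACITY = 4      # hypothetical page capacity
GLOBAL_DEPTH = 4  # fixed global depth: directory doubling is not modelled


def pseudo_key(key):
    # Hypothetical hash; the directory is indexed by its low-order (suffix) bits.
    return (key * 2654435761) & 0xFFFFFFFF


class Page:
    def __init__(self, depth, keys=()):
        self.depth = depth           # local depth; only touched under self.lock
        self.keys = frozenset(keys)  # replaced in one assignment: an atomic page write
        self.lock = threading.Lock()


class ExtendibleHash:
    def __init__(self):
        self.dir = [Page(0)] * (1 << GLOBAL_DEPTH)  # all entries share one page

    def _slot(self, key):
        return pseudo_key(key) & ((1 << GLOBAL_DEPTH) - 1)

    def search(self, key):
        # No locks: repeat <R_d, R_p, V_d> until the entry read is unchanged.
        i = self._slot(key)
        while True:
            page = self.dir[i]        # R_d
            found = key in page.keys  # R_p
            if self.dir[i] is page:   # V_d
                return found

    def _lock_verified(self, i):
        # Verification loop <R_d, L_p, V_d>; returns page p, still locked.
        while True:
            page = self.dir[i]        # R_d (R'_d in the final round)
            page.lock.acquire()       # L_p
            if self.dir[i] is page:   # V_d (V'_d in the final round)
                return page
            page.lock.release()

    def delete(self, key):
        page = self._lock_verified(self._slot(key))
        page.keys = page.keys - {key}  # R'_p, then W_p
        page.lock.release()            # U_p

    def insert(self, key):
        page = self._lock_verified(self._slot(key))
        keys = set(page.keys) | {key}  # R'_p
        if len(keys) <= CAPACITY or page.depth == GLOBAL_DEPTH:
            page.keys = frozenset(keys)  # W_p (overflow allowed at full depth)
            page.lock.release()          # U_p
            return
        # Split once on bit `depth`; either half may still exceed CAPACITY.
        bit = 1 << page.depth
        moved = {k for k in keys if pseudo_key(k) & bit}
        new = Page(page.depth + 1, moved)
        new.lock.acquire()               # new page stays locked until U
        for j in range(len(self.dir)):
            if self.dir[j] is page and j & bit:
                self.dir[j] = new        # directory updates; the last is W_d
        new.lock.release()               # U
        page.depth += 1
        page.keys = frozenset(keys - moved)  # W_p
        page.lock.release()                  # U_p


def insert_all(table, keys):
    for k in keys:
        table.insert(k)


table = ExtendibleHash()
workers = [threading.Thread(target=insert_all, args=(table, range(n, n + 100)))
           for n in (0, 100, 200)]
for w in workers:
    w.start()
for w in workers:
    w.join()
table.delete(42)
print(all(table.search(k) for k in range(300) if k != 42))  # True
print(table.search(42))                                     # False
```

An operation waits for a lock only in `_lock_verified`, where it holds no other lock; the new page locked during a split is unreachable until the directory update, so no other operation can be waiting on it at that point.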

Definition. Let I be an insertion/deletion operation. We define the range of the keys relocated by I as the migration set of I, denoted as $\mathrm{migration}(I)$.

Since the deletion operation never relocates any record, its migration set is obviously empty.
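Note that $\mathrm{migration}(I)$ is a range of key values rather than just the records currently stored: in Lemma 3 it is tested against a key that another operation may be about to insert. A minimal sketch, assuming (our assumption, not stated here) that the directory is indexed by the low-order bits of the pseudo key and that a split of a page at local depth d moves out the keys whose bit d is 1; the function names are hypothetical.

```python
def migration_test(pattern, depth, pseudo_key):
    """Membership test for migration(I), where I splits the page whose pseudo
    keys share the low `depth` bits `pattern`; keys whose next bit is 1 move out."""
    mask = (1 << (depth + 1)) - 1
    target = pattern | (1 << depth)
    return lambda key: (pseudo_key(key) & mask) == target


def deletion_migration_test(key):
    # A deletion never relocates a record: its migration set is empty.
    return False


# Split of the page holding the odd pseudo keys (depth 1, pattern 0b1),
# with the identity standing in for the pseudo-key function.
in_migration = migration_test(0b1, 1, lambda k: k)
print([k for k in range(12) if in_migration(k)])  # [3, 7, 11]
```

Keys 7 and 11 belong to the migration set whether or not they are currently stored in the page.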

Lemma 3. Any two concurrent insertion/deletion operations $I_1$ and $I_2$ always interleave correctly.

Proof. Let the key to be operated on by $I_1$ be $k_1$, and that by $I_2$ be $k_2$. Assume without loss of generality that $R'_d(I_1) < R'_d(I_2)$. We consider the following cases, and for each case we show that $I_1$ and $I_2$ interleave correctly.

(1) $\mathrm{migration}(I_1)$ contains $k_2$. Then $I_1$ must update the directory entry for $k_2$, denoted as $e_2$, that $I_2$ needs to read. Two subcases are considered. (i) $I_2$ reads $e_2$ in its final round after $I_1$ updates it, i.e., $W_d(I_1) < R'_d(I_2)$, where $W_d(I_1)$ is the step in which $I_1$ updates $e_2$. Then $I_2$ cannot read the page pointed to by the value of $e_2$ it read until $I_1$ releases its lock on that page, by which time $I_1$ would have finished all its operations on directories. Therefore the only dependency that the directory entry operations can possibly induce between $I_1$ and $I_2$ is $I_1$ giving to $I_2$. Since $I_1$ will not read or write any data pages after $I_2$ writes them, the only dependency that the data page operations can induce is also $I_1$ giving to $I_2$. Therefore any interleaving of $I_1$ and $I_2$ is equivalent to serializing $I_1$ before $I_2$, and is therefore correct. (ii) $R'_d(I_2) < W_d(I_1)$, i.e., $I_2$ reads $e_2$ before $I_1$ updates it. In this case $I_2$ will be forced to wait until $I_1$ releases its lock on the page it is splitting, by which time $W_d(I_1)$ would have already occurred, which means $V'_d(I_2)$ would have failed, a contradiction.

(2) $\mathrm{migration}(I_1)$ does not contain $k_2$. There are also two subcases. (i) $\mathrm{migration}(I_2)$ contains $k_1$. Let the page read in $R'_p(I_1)$ be p. $I_1$ holds a lock on p until it finishes. Since $R'_d(I_1) < R'_d(I_2)$, $I_2$ can read p (if it ever does) only after $I_1$ is finished. Therefore the only possible dependency is $I_1$ giving to $I_2$, and the interleaving is correct. (ii) $\mathrm{migration}(I_2)$ does not contain $k_1$. In this case no conflict can occur between $I_1$ and $I_2$ on directory entries, and since data pages are two-phase locked, the interleaving must be correct.
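Subcase (1)(ii) can be replayed deterministically with two threads. In this sketch (ours; the names `directory`, `locks`, `rounds` and `insertion_2` are hypothetical) the main thread plays $I_1$, which holds the lock on p while it splits, and an event forces $I_2$'s directory read to happen before $I_1$'s update of $e_2$. $I_2$ cannot complete $L_p$ until $I_1$ unlocks p, so its verification in that round fails and it must start another round; hence its final round reads $e_2$ after the update, which is subcase (i).

```python
import threading

directory = {"e2": "p"}                   # entry for k_2; I_1 is splitting page p
locks = {"p": threading.Lock(), "p2": threading.Lock()}
rounds = []                               # (page read in R_d, outcome of V_d) per round
read_done = threading.Event()


def insertion_2():
    # I_2's verification loop <R_d, L_p, V_d>, repeated until V_d succeeds.
    while True:
        ptr = directory["e2"]             # R_d(I_2)
        read_done.set()
        locks[ptr].acquire()              # L_p(I_2): waits while I_1 holds the lock
        ok = directory["e2"] == ptr       # V_d(I_2)
        rounds.append((ptr, ok))
        locks[ptr].release()              # a real insertion would keep the lock on success
        if ok:
            return


locks["p"].acquire()                      # I_1 holds the lock on p during its split
t = threading.Thread(target=insertion_2)
t.start()
read_done.wait()                          # force R_d(I_2) < W_d(I_1)
directory["e2"] = "p2"                    # W_d(I_1): k_2 is in migration(I_1)
locks["p"].release()                      # I_1 unlocks p only after W_d
t.join()
print(rounds)  # [('p', False), ('p2', True)]
```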

From the above three lemmas, one concludes that our algorithms for concurrent search/insertion/deletion operations are correct. Q.E.D.

5. Conclusion

We have presented an algorithm for synchronising concurrent operations in extendible hash files. The algorithm allows the search operations to proceed concurrently with insertion operations without having to acquire locks on the directory entries or the data pages. It also allows concurrent insertion/deletion operations to proceed without having to acquire locks on the directory entries. Moreover, because at most a single lock is required at any time for each of these operations, the algorithm is deadlock free. The algorithm combines the method of verification used in the optimistic concurrency control algorithm with the special structure of operations in extendible hash files to yield a higher level of concurrency as well as a lower synchronisation overhead.

In this paper we ignore the issues of underflow and compaction. We also did not discuss the issue of directory expansion (i.e., doubling) extensively. However, the latter can be handled by a straightforward extension of the current algorithm, which requires that (1) every time a verification (i.e., re-read) of the content of the directory entry is performed, the global depth and the base address of the directory are also re-read, and that (2) the old version of the directory is carried around in memory for a specified period of time. (Incidentally, (2) can be relaxed if the bits in the pseudo key used to index into the directory are the suffix rather than the prefix of the pseudo key.) If these prove to be practical to implement, directory expansion can be allowed to proceed concurrently with search operations. In any case, database quiescence can always be resorted to as the method for handling directory expansion.
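The remark about suffix bits can be illustrated on a toy directory. The reading below is our interpretation, not spelled out in the text: with suffix indexing, doubling amounts to appending a copy of the old directory, so every old entry keeps its position, and a reader still using the old global depth reaches the same entry as long as the copy is appended in place. With prefix indexing, doubling interleaves the entries, so old positions become stale. `WIDTH` and the helper names are hypothetical.

```python
WIDTH = 8  # hypothetical pseudo-key width in bits


def suffix_index(hk, g):
    return hk & ((1 << g) - 1)   # low-order g bits


def prefix_index(hk, g):
    return hk >> (WIDTH - g)     # high-order g bits


def double_suffix(d):
    # Entries i and i + 2**g share their low g bits: append a copy.
    return d + d


def double_prefix(d):
    # Entry i becomes entries 2i and 2i + 1: interleave.
    return [e for e in d for _ in (0, 1)]


g = 2
old = [object() for _ in range(1 << g)]   # four distinct bucket stand-ins
new_s, new_p = double_suffix(old), double_prefix(old)
keys = range(1 << WIDTH)

# Both doublings preserve every lookup...
assert all(new_s[suffix_index(k, g + 1)] is old[suffix_index(k, g)] for k in keys)
assert all(new_p[prefix_index(k, g + 1)] is old[prefix_index(k, g)] for k in keys)
# ...but only suffix doubling keeps each old entry at its old position.
print(all(new_s[i] is old[i] for i in range(len(old))))  # True
print(all(new_p[i] is old[i] for i in range(len(old))))  # False
```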

The algorithm can also be applied to handle dynamic perfect hash files [YD84]. The dynamic perfect hash file structure employs a method that optimises the space requirement of the directory used in an extendible hash file, thus making it more practical to keep the directory memory resident. However, the structure of the directory in a dynamic perfect hash file is more complicated than that of an ordinary extendible hash file, and extensions of the current algorithm must be sought.